
feat(web): tokenize input corrections and provide for multi-token predictions 🚂 🔪#15897

Open
jahorton wants to merge 1 commit into refactor/web/single-char-correction-penalty from feat/web/multi-token-predict-core

jahorton commented Apr 30, 2026

For this PR's "baby step", the common `buildCorrectionSequence` method is reworked to use the search result type provided by `TokenizationCorrector`. Work in the previous PR (#16023) ensures that even custom and legacy model types, which cannot leverage `TokenizationCorrector` itself, can still provide data in that same search result type.

At this point, the methods that call `buildCorrectionSequence` still pass it only a single token to correct. Baby steps.
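A minimal sketch of why a shared result type helps, assuming hypothetical names throughout: once every model path (`TokenizationCorrector`, or a custom/legacy model adapted to the same shape) yields one common search-result type, the consumer only has to be written once. `CorrectionCandidate`, `TokenizedSearchResult`, `wrapSingleToken`, and `buildCorrectionSequenceSketch` are illustrative stand-ins, not Keyman's actual API. The adapter mirrors this PR's current limit of a single token per correction.

```typescript
// Hypothetical stand-ins for the shared search-result type; these are not
// Keyman's real type names or fields.
interface CorrectionCandidate {
  text: string; // corrected form of one token
  cost: number; // lower cost = more likely correction
}

interface TokenizedSearchResult {
  tokens: CorrectionCandidate[]; // one entry per corrected token
  totalCost: number;
}

// Hypothetical adapter for a model path that cannot use a multi-token
// corrector (e.g. a custom or legacy model): wrap its single correction
// in the shared shape.
function wrapSingleToken(text: string, cost: number): TokenizedSearchResult {
  return { tokens: [{ text, cost }], totalCost: cost };
}

// A consumer written once against the shared shape; it does not care
// which path produced the result or how many tokens it holds.
function buildCorrectionSequenceSketch(result: TokenizedSearchResult): string[] {
  return result.tokens.map((t) => t.text);
}

// Single token, as callers currently provide in this PR.
const single = buildCorrectionSequenceSketch(wrapSingleToken("hello", 1.5));

// The same consumer accepts a multi-token result unchanged.
const multi = buildCorrectionSequenceSketch({
  tokens: [{ text: "good", cost: 0.5 }, { text: "morning", cost: 1 }],
  totalCost: 1.5,
});
```

The design point is that widening callers from one token to several later requires no change to the consumer, only to what the callers pass in.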

Build-bot: skip build:web
Test-bot: skip

keymanapp-test-bot (bot) commented Apr 30, 2026

User Test Results

Test specification and instructions

User tests are not required

Test Artifacts

  • Web
    • KeymanWeb Test Home - build : all tests passed (no artifacts on BuildLevel "build")

keymanapp-test-bot changed the title from "feat(web): rework buildAndMapPredictions for multi-token predictions" to "feat(web): rework buildAndMapPredictions for multi-token predictions 🚂" Apr 30, 2026
keymanapp-test-bot added this to the A19S28 milestone Apr 30, 2026
jahorton force-pushed the feat/web/prep-tokenization-search branch from 5b1c721 to d558c86 May 7, 2026 18:13
jahorton force-pushed the feat/web/multi-token-predict-core branch from 6d11477 to fdd65c0 May 7, 2026 18:21
keyman-server modified the milestones: A19S28, A19S29 May 11, 2026
jahorton force-pushed the feat/web/prep-tokenization-search branch from d558c86 to 7a6f297 May 15, 2026 18:39
jahorton force-pushed the feat/web/multi-token-predict-core branch from 698be9b to a1ae1cb May 15, 2026 19:15
jahorton force-pushed the feat/web/prep-tokenization-search branch from 7a6f297 to 221941b May 22, 2026 20:22
keyman-server modified the milestones: A19S29, A19S30 May 23, 2026
jahorton force-pushed the feat/web/multi-token-predict-core branch from a1ae1cb to dc77cca May 27, 2026 17:52
jahorton changed the title from "feat(web): rework buildAndMapPredictions for multi-token predictions 🚂" to "feat(web): tokenize input corrections and provide for multi-token predictions 🚂" May 27, 2026
jahorton force-pushed the feat/web/multi-token-predict-core branch from dc77cca to 754d792 May 27, 2026 21:03
jahorton changed the base branch from feat/web/prep-tokenization-search to refactor/web/single-char-correction-penalty May 27, 2026 21:03
Comment on lines +619 to +623
const suggestionRange = determineSuggestionRange(transition.base.displayTokenization.tokens, tokenization.tokens, (a, b) => a.spaceId == b.spaceId);
suggestionRange.transitionId = transition.transitionId;
const corrector = new TokenizationCorrector(tokenization, suggestionRange.tokensToPredict.length, () => true);
const predictionPrep = determineTokenizedCorrectionSequence(transition, tokenization, new TokenizationResultMapping([match], corrector));

Contributor Author

This brings the main correction loop one step closer to multi-token support. In this PR, though, it's only a stop-gap.
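To make the excerpt's `(a, b) => a.spaceId == b.spaceId` comparison concrete, here is a hedged sketch of what a suggestion-range computation over two tokenizations might look like. It assumes (this PR does not confirm it) that the range is found by walking the shared prefix of the old and new tokenizations under the supplied equality predicate, and that the final, in-progress token is always predicted. `Token`, `SuggestionRangeSketch`, and `suggestionRangeSketch` are hypothetical names; the real `determineSuggestionRange` may behave differently.

```typescript
// Hypothetical token shape: only `spaceId` matters for the comparison below.
interface Token {
  text: string;
  spaceId: number;
}

// Hypothetical result shape, loosely modeled on fields the excerpt uses.
interface SuggestionRangeSketch {
  sharedPrefixLength: number;
  tokensToPredict: Token[];
  transitionId?: number;
}

// Assumption: tokens matching under `equal` from the start are unchanged;
// everything after that shared prefix needs a prediction.
function suggestionRangeSketch(
  baseTokens: Token[],
  newTokens: Token[],
  equal: (a: Token, b: Token) => boolean
): SuggestionRangeSketch {
  // Stop one short of the end so the last (in-progress) token is always predicted.
  const limit = Math.min(baseTokens.length, Math.max(newTokens.length - 1, 0));
  let i = 0;
  while (i < limit && equal(baseTokens[i], newTokens[i])) {
    i++;
  }
  return { sharedPrefixLength: i, tokensToPredict: newTokens.slice(i) };
}

const sameSpace = (a: Token, b: Token) => a.spaceId === b.spaceId;
const base: Token[] = [{ text: "the", spaceId: 1 }, { text: "quickbrown", spaceId: 2 }];

// Editing the last token in place: only that token needs a prediction.
const edited = suggestionRangeSketch(
  base,
  [{ text: "the", spaceId: 1 }, { text: "quickbrowm", spaceId: 2 }],
  sameSpace
);

// A correction that splits the last token in two: both new tokens are predicted.
const split = suggestionRangeSketch(
  base,
  [{ text: "the", spaceId: 1 }, { text: "quick", spaceId: 4 }, { text: "brown", spaceId: 5 }],
  sameSpace
);
```

Comparing by `spaceId` rather than by text lets an edited token still count as "the same slot", which is why the sketch needs the explicit rule that the last token is always predicted.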

jahorton force-pushed the refactor/web/single-char-correction-penalty branch from f5afb06 to cc902e9 May 28, 2026 21:31
jahorton force-pushed the feat/web/multi-token-predict-core branch from 754d792 to f9f9b08 May 28, 2026 21:33
keymanapp-test-bot changed the title from "feat(web): tokenize input corrections and provide for multi-token predictions 🚂" to "feat(web): tokenize input corrections and provide for multi-token predictions 🚂 🔪" May 28, 2026
jahorton force-pushed the refactor/web/single-char-correction-penalty branch from cc902e9 to bf547ba June 1, 2026 21:10
jahorton force-pushed the feat/web/multi-token-predict-core branch from f9f9b08 to 25f7f16 June 1, 2026 21:12
jahorton force-pushed the refactor/web/single-char-correction-penalty branch from bf547ba to d0e11c9 June 2, 2026 14:27
jahorton force-pushed the feat/web/multi-token-predict-core branch from 25f7f16 to 8fa77c0 June 2, 2026 14:28
keyman-server modified the milestones: A19S30, A19S31 Jun 8, 2026
jahorton force-pushed the refactor/web/single-char-correction-penalty branch 2 times, most recently from 9b972f4 to fbfd88a June 11, 2026 17:57
jahorton force-pushed the feat/web/multi-token-predict-core branch from 8fa77c0 to f30651a June 11, 2026 18:19
jahorton force-pushed the refactor/web/single-char-correction-penalty branch from fbfd88a to 0a85adc June 11, 2026 18:26
jahorton force-pushed the feat/web/multi-token-predict-core branch 3 times, most recently from 9db4da8 to f7c1bfa June 11, 2026 18:55
…dictions

Build-bot: skip build:web
Test-bot: skip
jahorton force-pushed the feat/web/multi-token-predict-core branch from f7c1bfa to 1a7bb31 June 11, 2026 19:09
jahorton requested a review from ermshiperete June 11, 2026 19:37
jahorton marked this pull request as ready for review June 11, 2026 20:00

Projects

Status: Todo

2 participants